some suggestions #112

TITC · 2022-04-02T08:18:44Z

1. Normalization is influenced by the alpha channel.

```python
grayscale = (data[..., 0]-data[..., 0].min()) / (data[..., 0].max()-data[..., 0].min())*255
```

2. The paste size does not match.

```python
padded.paste(im, (0, 0, im.size[0], im.size[1]))
```

3. The wrong pixel value is used for padding when the text is inverted. The text is sometimes inverted, but the padding pixel is hard-coded to 255.

```python
padded = Image.new('L', dims, 255)
```

I noticed this causes misrecognition: when the text pixels are 255 and the padding pixels are also 255, the padded region is recognized as text.


lukas-blecher · 2022-04-03T14:19:02Z

Thanks for the contribution.
I have a couple of comments:

The normalization influenced by the alpha channel:
You are correct. The previous line was not working quite as intended, but right now there is a bigger problem: images with a black font on a transparent background are zero everywhere in the first channel. All the information is in the alpha channel, so the line you proposed would result in an error (the first channel's minimum and maximum are equal, so the denominator is zero).
With this in mind, I moved some code up to solve the issue altogether. The result is an array with only one channel.
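For illustration, the approach described above can be sketched in NumPy. This is a minimal sketch, not the code from 49480af: the helper name `to_single_channel` is hypothetical, and it assumes the image is first converted to Pillow's `LA` mode. If the alpha channel varies, it is taken to carry the glyphs and is inverted so that opaque strokes become dark; otherwise the luminance channel is used. The min-max step is guarded against a constant channel, which is exactly the case that breaks a normalization over the first channel alone.

```python
import numpy as np
from PIL import Image


def to_single_channel(img: Image.Image) -> np.ndarray:
    """Hypothetical helper: collapse a Pillow image to one uint8 channel in [0, 255]."""
    data = np.array(img.convert("LA")).astype(np.float64)
    lum, alpha = data[..., 0], data[..., 1]
    if alpha.var() == 0:
        # Uniform alpha (e.g. an opaque image): the luminance holds the content.
        gray = lum
    else:
        # Content lives in the alpha channel: opaque glyph pixels become dark.
        gray = 255 - alpha
    lo, hi = gray.min(), gray.max()
    if hi == lo:
        # Blank image: avoid dividing by zero in the min-max normalization.
        return np.full(gray.shape, 255, dtype=np.uint8)
    return ((gray - lo) / (hi - lo) * 255).astype(np.uint8)


# Black "glyph" on a fully transparent background: only the alpha channel varies.
rgba = np.zeros((4, 4, 4), dtype=np.uint8)
rgba[1:3, 1:3, 3] = 255
single = to_single_channel(Image.fromarray(rgba))
```

The glyph ends up at 0 and the background at 255, i.e. black on white in a single channel.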
Paste size does not match:
I don't understand what the difference is. Could this be related to "images do not match error in the pad function? #76"?
Padding pixel:
It is quite important that the padding pixel is 255. If the text is white, the image is inverted beforehand so that there is always a black font on white ground.
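A minimal sketch of this convention, assuming a single-channel uint8 array, a mean-brightness threshold of 128 to detect white-on-black text, and padding up to a multiple of 32 (all assumptions, as is the helper name `invert_and_pad`). Because the image is forced to black-on-white before padding, a fill value of 255 matches the background rather than the text.

```python
import numpy as np
from PIL import Image


def invert_and_pad(gray: np.ndarray, divisor: int = 32, threshold: int = 128) -> Image.Image:
    """Hypothetical helper: force black text on white, then pad to a multiple of `divisor`."""
    if gray.mean() < threshold:
        # Mostly dark image: assume white text on black and invert it.
        gray = 255 - gray
    im = Image.fromarray(gray.astype(np.uint8))
    # Round each side up to the next multiple of the divisor.
    w, h = im.size
    dims = tuple(-(-side // divisor) * divisor for side in (w, h))
    padded = Image.new("L", dims, 255)  # white fill matches the (now white) background
    padded.paste(im, (0, 0))            # 2-tuple box: upper-left corner only
    return padded


# White "text" on a black 40x20 image.
img = np.zeros((20, 40), dtype=np.uint8)
img[5:10, 5:30] = 255
padded = invert_and_pad(img)
```

The result is 64x32: the text is black, and both the original background and the added margin are white, so the padding cannot be mistaken for strokes.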

I have implemented these changes and pushed them to your branch (49480af). Feel free to comment.
This PR also solves the same problem as #113, so I'll close that one.

TITC · 2022-04-03T15:11:10Z

Thanks for your attention. @lukas-blecher

The normalization influenced by the alpha channel:
For the case of a black font on a transparent background, I removed the transparency channel in suport RGBA #113 (comment).
I tested with the image below, and it passed.
Paste size does not match:
As the documentation says, `getbbox` "Calculates the bounding box of the non-zero regions in the image." I noticed that the size returned by `getbbox` is sometimes smaller than `im.size`. Unfortunately, my WSL broke this evening, so I can't reproduce it right now.
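For reference, here is Pillow's rule for the `box` argument of `Image.paste`, shown with arbitrary sizes: a 2-tuple is treated as the upper-left corner, while a 4-tuple must describe a region exactly the size of the pasted image, otherwise Pillow raises `ValueError` with the message "images do not match", the error reported in #76. This only illustrates the library behavior; the thread does not establish that it is what happened in the case above.

```python
from PIL import Image

canvas = Image.new("L", (64, 32), 255)  # white padded canvas
glyphs = Image.new("L", (40, 20), 0)    # black content to paste

# 2-tuple: treated as the upper-left corner; the region size comes from glyphs.size.
canvas.paste(glyphs, (0, 0))

# 4-tuple: the region (38 x 20) differs from glyphs.size (40 x 20), so Pillow refuses.
message = None
try:
    canvas.paste(glyphs, (0, 0, 38, 20))
except ValueError as err:
    message = str(err)
```

Passing only the corner lets Pillow derive the region from the pasted image itself, so the two sizes cannot drift apart.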
Padding pixel is 255:

> so that there is always a black font on white ground.

Here is the input image:

And here is the padded image. The line at the bottom doesn't exist in the original picture but appears after padding. I think this line is added somewhere in the intermediate steps.

Here are more intermediate images; their filename prefixes correspond to the variable names.

If the information above is not enough, I will add more once my WSL is repaired.😁

I encountered these problems with the images below. If you don't mind, you can test them before I've repaired my WSL.

lukas-blecher · 2022-04-03T15:33:05Z

Thanks for the detailed comment.
I don't get the black line when padding the image above.
If I input the same image as you, the downsampled image at the end is the following:

Maybe it's some leftover code on your side?

TITC · 2022-04-03T15:48:16Z

Thanks for your reply.

I can't rule out that possibility. I will roll back to this version to eliminate unrelated variables. It's too late here in China; I will reply as soon as possible tomorrow.

BTW, could you teach me how to build such a brilliant formula recognition project from scratch? Any advice is welcome; I am new to this area and want to learn from you.

TITC · 2022-04-03T17:07:08Z

You are right.

I had written a pointless line:

```python
data = np.stack((grayscale, grayscale), axis=-1)
```

It makes `rect[..., -1]` nonzero, although it should be 0.

As a result, the pixels are inverted again here:

```python
im = Image.fromarray((255-rect[..., -1]).astype(np.uint8)).convert('L')
```
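A toy NumPy example of the mechanism (the 1×3 row is made up): once the grayscale array is stacked into both channels, the last channel is no longer a real alpha channel but a copy of the grayscale values, so an expression of the form `255 - rect[..., -1]` turns black-on-white content into white-on-black.

```python
import numpy as np

# One row of a black-on-white image: white, black glyph pixel, white.
grayscale = np.array([[255, 0, 255]], dtype=np.float64)

# The stray line: the "alpha" channel becomes a copy of the grayscale values.
data = np.stack((grayscale, grayscale), axis=-1)
alpha = data[..., -1]

# Treating that channel as alpha and inverting it flips the polarity:
# the glyph becomes white and the background black.
inverted = (255 - alpha).astype(np.uint8)
```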

I think the main reason for this mistake is that I don't really understand the main logic behind your code. Could you give me any advice?

lukas-blecher · 2022-04-04T12:51:05Z

I'm happy to give any insight if you have a specific question.
I don't know what to tell you regarding the main logic.
Sorry for the lack of comments in the code. I understand that it is difficult to see the purpose sometimes.

TITC · 2022-04-04T13:23:55Z

Glad to get your reply.

Your code is much better than my coworkers'. My point is that my confusion comes from gaps in my knowledge.

Here is what I have:

- Some knowledge from Andrew Ng's deep learning courses.
- Some work experience with Chinese NLP.
- Some familiarity with semantic segmentation and object detection, with project experience in both.
- I have read some papers, such as Attention Is All You Need, BERT, Unet, Mnet, and so on.
- Familiarity with the TensorFlow, Keras, and PyTorch APIs.
- Linear algebra from MIT's Gilbert Strang lectures on YouTube.
- Advanced mathematics.
- Probability and statistics.

Here is what I want to know:

- Could you recommend some papers related to this repo, besides the ones below that I am already reading?
- How can I get started in this area? Any tutorial you recommend is welcome. I am totally new to formula recognition, and I do not want to be someone who just knows how to call an API. I want to go deeper.
- How can I build a small demo myself from scratch? What else do I need to know?

TITC · 2022-04-04T13:34:22Z

In fact, I want to build a handwritten formula recognition project, and I noticed that your to-do list includes this. Is there any possibility that I could become a collaborator on this project and learn from you?😁

lukas-blecher · 2022-04-04T15:38:00Z

I sadly don't have any more papers for you. When I started this project, ViT had just been proposed, and I wanted to make a formula recognition model. I can tell you, though, that it is helpful to have a CNN backbone in the encoder.
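To make the "CNN backbone in the encoder" remark concrete, here is a minimal PyTorch sketch of a hybrid encoder, assuming grayscale input and illustrative sizes: a small convolutional stem downsamples the image, each position of the resulting feature map becomes a token, and a standard Transformer encoder attends over the tokens. The class name `HybridEncoder` and all hyperparameters are hypothetical and far smaller than a real model; this is not the repository's encoder.

```python
import torch
import torch.nn as nn


class HybridEncoder(nn.Module):
    """Hypothetical hybrid encoder: CNN stem -> feature-map tokens -> Transformer."""

    def __init__(self, dim: int = 64, depth: int = 2, heads: int = 4, max_tokens: int = 1024):
        super().__init__()
        # CNN backbone: two stride-2 convolutions shrink height and width by 4
        # (exactly, for sides divisible by 4).
        self.backbone = nn.Sequential(
            nn.Conv2d(1, dim // 2, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(dim // 2),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim // 2, dim, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(dim),
            nn.ReLU(inplace=True),
        )
        # Learned positional embedding with one slot per possible token.
        self.pos = nn.Parameter(torch.randn(1, max_tokens, dim) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)                   # (B, dim, H/4, W/4)
        tokens = feats.flatten(2).transpose(1, 2)  # (B, N, dim), N = H/4 * W/4
        if tokens.size(1) > self.pos.size(1):
            raise ValueError("image yields more tokens than max_tokens")
        tokens = tokens + self.pos[:, : tokens.size(1)]
        return self.transformer(tokens)            # (B, N, dim)


encoder = HybridEncoder()
out = encoder(torch.rand(2, 1, 64, 128))  # two 64x128 grayscale images
```

Here the convolutional stem turns a 64x128 image into 16x32 = 512 tokens instead of one token per pixel, and gives the attention layers local stroke features to work with.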

Regarding the handwriting project: I see you already noticed the colab notebook I linked in the README: https://colab.research.google.com/drive/1ba_qCGJl29dFQqfBjdqMik3o_EqPE4fr
There I outlined how to finetune the model on both rendered and handwritten formulas.
I just didn't have the time yet to fully train the models.

TITC · 2022-04-04T23:51:58Z


Thanks for your advice. I look forward to seeing your results.

TITC and others added 2 commits April 2, 2022 16:17

fix problems with alpha channel (49480af)

lukas-blecher mentioned this pull request Apr 3, 2022: suport RGBA #113 (Closed)

lukas-blecher merged commit 08053ab into lukas-blecher:main Apr 3, 2022

TITC changed the title ~~fix bugs~~ some suggestion Apr 3, 2022

TITC changed the title ~~some suggestion~~ some suggestions Apr 3, 2022

TITC deleted the patch-1 branch April 4, 2022 14:09

lukas-blecher mentioned this pull request Apr 8, 2022: images do not match error in the pad function? #76 (Closed)
